Spatio-temporal analysis of Dynamic Origin-Destination data using Latent Dirichlet Allocation. Application to the Vélib’ Bike Sharing System of Paris
Authors
E. Côme, A. Randriamanamihaga, L. Oukhellou and P. Aknin
Abstract
This paper deals with a data mining approach applied to Bike Sharing System Origin-Destination data, but part of the proposed methodology can be used to analyze other modes of transport that similarly generate Dynamic Origin-Destination (OD) matrices. The transportation network investigated in this paper is the Vélib’ Bike Sharing System (BSS) deployed in Paris since 2007. An approach based on Latent Dirichlet Allocation (LDA) that extracts the main features of the spatio-temporal behavior of the BSS is introduced. This approach aims to summarize the behavior of the system by extracting a few OD-templates, interpreted as typical and temporally localized demand profiles. The spatial analysis of the obtained templates can be used to give insights into the system behavior and the underlying urban phenomena linked to city dynamics.
INTRODUCTION
The widespread use of smart card automated fare collection systems by transport operators can support innovative studies on human mobility. These fare collection systems collect a large amount of data related to travel across the whole public transit networks in which they are deployed. In this way, they can be viewed as passive sensors of human mobility. Advanced analysis of the streams of trips produced by these systems can give insights into human mobility, allowing transport operators to provide a better quality of service. It may also help sociologists and urban planners to understand the mobility patterns of users within the city. However, the volume of data collected is often large, which raises challenges for its exploitation. Automatic algorithms able to extract useful information from these sources have consequently become of great interest.
This paper deals with a data mining approach applied to Bike Sharing System Origin-Destination data, but part of the proposed methodology can be used to analyze other modes of transport that similarly generate Dynamic Origin-Destination (OD) matrices. The transportation network investigated in this paper is the Vélib’ Bike Sharing System of Paris, deployed since 2007. Its access system generates streams of detailed travel information, recorded as Origin-Destination data. This work investigates the analysis of sizeable OD-matrices using an advanced statistical model called Latent Dirichlet Allocation (LDA). This model, initially developed to process document collections, was adapted to mine such OD-data in order to extract the main features behind the spatio-temporal behavior of the BSS. The results provided by this model address the following issues:
• Identify a reduced set of demand profiles, specific to such soft modes of transport. The spatial analysis of the resulting patterns can be used to get a better understanding of the underlying urban phenomena linked to city dynamics.
• Build links between the sociological, economic and geographical context of a city and the usage of its BSS. BSS operators can benefit from this kind of analysis both to better understand the system usage and to learn how to improve the service quality of the existing system. In the future, such knowledge can be transferred to cities aiming to deploy new BSSs.
• Get a better understanding of the problem of balancing the load of bikes. One of the main issues raised by BSS users in recent surveys is the availability of bikes: users are confronted with empty stations when they want to rent bikes, and with full stations when they want to return them. Redistribution of bikes, which consists of relocating them among the stations, is then necessary in most BSSs to compensate for the uneven demand of users. This issue is traditionally addressed within the field of Operations Research, in which optimization policies for bike redistribution are developed. In this paper, we focus on a data mining approach that aims to provide indicators of station imbalances, which may be used as inputs to advanced Operations Research algorithms.
The paper is organized as follows: Section 3 is devoted to related work conducted on BSS data analysis. In Section 4, we detail the data mining approach based on Latent Dirichlet Allocation, which was used to carry out the BSS data analysis. The obtained usage patterns as well as bike station imbalances are also analyzed. In Section 5, contextual elements of the Bike Sharing System of Paris are given. Then the results of the proposed methodology applied to the Vélib’ OD-data are presented and discussed in Section 6, as well as new operational indicators, the obtained usage patterns and the per-station bike imbalance of the BSS. Conclusions and perspectives are finally presented to show how data mining approaches applied to newly available data sources can lead to innovative modeling and a better understanding of urban mobility.
RELATED WORK
Several research studies have been conducted on BSS data over the past few years. They generally arise from two main fields of research: Operations Research and Data Mining. The works from the former field mainly concern the optimization of the load balancing of bikes, often necessary to compensate for the uneven demand for bikes. This is usually performed with trucks that move bikes between the stations. The reader interested in this topic can refer to Benchimol et al., Chemla et al., Nair et al., Lin and Yang (1, 2, 3, 4).
Data Mining approaches have been applied in various ways to BSS data. Two main topics have been investigated: Clustering and Prediction.
The Prediction topic focuses on developing models able to forecast the usage of stations or, more generally, the behavior of the transportation network in either the short term or the long term (see Froehlich et al., Borgnat et al., Kaltenbrunner et al., Michau et al., Vogel et al. (5, 6, 7, 8, 9)). The Clustering topic aims to uncover spatio-temporal patterns in the BSS usage, thus highlighting the relationships between time of day, location and usage. This is classically done by partitioning the set of stations into clusters with similar usage patterns. However, one of the key differences among these studies concerns how the usage of the BSS is described. A major part of the research on BSSs uses public data sampled from the operator’s website, which consist of station-occupancy statistics such as the number of available bicycles and free slots per station over a day. The remaining studies focus directly on mining the anonymized individual dynamic OD-trips provided by the BSS operators.
Using station occupancy data collected from the Bicing BSS of Barcelona, Froehlich et al. (10) and Froehlich et al. (5) proposed methodologies that identify its main usage patterns and predict station usage within a prediction window ranging from 10 to 120 minutes. Lathia et al. (11) investigated how a new user access policy in the London Barclays Cycle Hire Scheme affected the system usage across the city, using both spatial and temporal analysis of station occupancy data. Other approaches use trip data to analyze BSS usage, such as the recent study of Borgnat et al. (12) on the Lyon Vélo’v BSS data, in which different graphs are used to extract similar usage profiles (in terms of arrival/departure count correlations) between pairs of stations during weekdays and weekends, which leads to a clustering of the stations.
Carried out on the same BSS, another approach proposed by Borgnat et al. (13), similarly based on a dynamical view of the transportation network, aims to uncover communities of stations that exchange bikes in a preferential way: the activity between the stations was clustered using a graph clustering algorithm, and the resulting communities exhibited similar exchange dynamics. A statistical approach based on a Poisson mixture model has been proposed by Randriamanamihaga et al. (14) to discover usage patterns in the Vélib’ BSS data on the basis of a clustering of flows.
Other studies using OD-trips data are proposed by Vogel et al., Vogel and Mattfeld (9, 15) and aim to identify a reduced set of clusters of stations to get a better understanding of the spatial and temporal causes of imbalances between BSS stations. The proposed methodology, based on a Geographical Business Intelligence process, was successfully applied to data collected from Vienna’s BSS, Citybike Wien. It used feature vectors, i.e. the per-hour and per-station normalized number of incoming and outgoing trips recorded during weekdays and weekends, to describe the stations. Three clustering algorithms (K-Means, a Gaussian Mixture Model estimated through the EM algorithm, and sequential Information-Bottleneck (sIB)) are then compared.
The approach undertaken in this paper is based on Latent Dirichlet Allocation, a text-categorization algorithm introduced in the seminal paper of Blei et al. (16). In contrast to Montoliu (17), who uses LDA to analyze a BSS using occupancy data, this work deals with OD-trips data. The second key difference concerns the formulation of the approach. In Montoliu (17), as in most of the previous studies, the clustering step partitions a set of stations, whereas in this paper we aim to extract a few global and recurrent demand profiles that describe the behavior of the BSS.
In order to give a clear overview of the system dynamics, post-processing tools are furthermore introduced to analyze the results provided by LDA.
From a methodological point of view, topic models such as LDA are probabilistic generative models that aim to recover the latent structure of a document collection. Although initially developed to analyze text documents, probabilistic topic models have been applied to other problems: Farrahi and Gatica-Perez (18) aim to discover location-driven routines using mobile phone data, Huynh et al. (19) extract daily human routines from wearable sensors, and Niebles et al. (20) analyze trajectories and model semantic regions in video scenes. These topic models are used here to uncover the underlying mobility patterns, under the key assumption that the usage of a mode of transport can be summarized by a finite set of demand profiles, or routines, encoded within typical OD-templates. The topic model used in this paper is the LDA model, whose background is recalled in the next Section, along with its re-interpretation in the context of mining Dynamic Origin-Destination matrices.
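As an illustration only, a Dynamic OD matrix can be laid out as a table of trip counts indexed by time slot and OD pair, which is the kind of count table a topic model such as LDA takes as input. The minimal Python sketch below assumes hourly time slots, a toy list of trips and hypothetical station names and dates; it treats each time slot as a "document" and each OD pair as a "word". This mapping is an assumption made for the example, not necessarily the formulation used by the authors.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical trip records: (departure time, origin station, destination station).
trips = [
    ("2011-04-04 08:05", "A", "B"),
    ("2011-04-04 08:40", "A", "B"),
    ("2011-04-04 08:55", "C", "B"),
    ("2011-04-04 18:10", "B", "A"),
    ("2011-04-05 08:20", "A", "B"),
]


def dynamic_od_counts(trips):
    """Count trips per OD pair inside each (date, hour) time slot."""
    slots = defaultdict(Counter)
    for stamp, origin, destination in trips:
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        # Assumption: one-hour slots keyed by departure time.
        slots[(t.date().isoformat(), t.hour)][(origin, destination)] += 1
    return slots


def to_count_matrix(slots):
    """Lay the counts out as a slot-by-OD-pair matrix (one row per time slot)."""
    slot_ids = sorted(slots)
    od_pairs = sorted({pair for counts in slots.values() for pair in counts})
    # A Counter returns 0 for OD pairs that were not observed in a slot.
    matrix = [[slots[s][p] for p in od_pairs] for s in slot_ids]
    return slot_ids, od_pairs, matrix


slots = dynamic_od_counts(trips)
slot_ids, od_pairs, matrix = to_count_matrix(slots)
```

Each row of `matrix` is a bag-of-words style count vector over OD pairs, so under the assumptions above it could be handed to any LDA implementation that accepts a document-term count matrix.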
Similar resources
Clustering the Vélib' dynamic Origin/Destination flows using a family of Poisson mixture models
Studies on human mobility, including Bike Sharing System Analysis, have expanded over the past few years. They aim to give insight into the underlying urban phenomena linked to city dynamics and generally rely on data-mining tools to extract meaningful patterns from the huge volume of data recorded by such complex systems. This paper presents one such tool through the introduction of a family o...
Model-based count series clustering for Bike-sharing system usage mining, a case study with the Vélib’ system of Paris. Etienne Côme and Latifa Oukhellou
Bicycle sharing systems are increasingly numerous nowadays. These transportation systems generate sizable transportation data, the mining of which can reveal the underlying urban phenomena linked to city dynamics. This paper introduces a statistical model to automatically analyze bike sharing system trips data. This model will introduce a latent variable to partition the stations in terms ...
Service Network Design of Bike Sharing Systems
Bike sharing has recently enabled sustainable means of shared mobility through automated rental stations in metropolitan areas. Spatio-temporal variation of bike rentals leads to imbalances in the distribution of bikes causing full or empty stations in the course of a day. Ensuring the reliable provision of bikes and bike racks is crucial for the viability of these systems. This paper presents ...
Discovering Mobility Patterns on Bicycle-Based Public Transportation System by Using Probabilistic Topic Models
In this work, we present a new framework to discover the daily mobility routines which are contained in a real-life dataset collected from a bike-sharing system. Our goal is the discovery and analysis of mobility patterns which characterize the behavior of the stations of a bike-sharing system based on the number of available bikes along a day. An unsupervised methodology based on probabilistic...
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...